Perspective projection

Perspective projection is similar to how humans perceive the world. Unlike orthographic projection, the view volume is a frustum, a pyramid with a clipped top. The tip of the pyramid is the center of projection (COP), the clipped top is the near plane, and the base is the far plane. The view plane is where the image will be recorded and is set between the center of projection and the near plane.

Consider an object of height \(y\) at distance \(z\) from the COP, and an image plane at distance \(z_i\) from the COP. The height \(y_i\) of the projected image of \(y\) can then be found using similar triangles.

$$ \frac{y_i}{z_i} = \frac{y}{z}$$

$$ y_i = \frac{y}{z}z_i = y\frac{z_i}{z}$$
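
As a quick numeric check of the similar-triangles relation, here is a minimal Python sketch. The function name `project_height` is an illustrative choice, not something defined elsewhere in this text.

```python
def project_height(y: float, z: float, z_i: float) -> float:
    """Projected height y_i of an object of height y at depth z,
    for an image plane at depth z_i (depths measured from the COP)."""
    if z == 0:
        raise ValueError("the object lies at the center of projection")
    return y * z_i / z


# Doubling the depth halves the projected height.
print(project_height(2.0, 4.0, 1.0))  # 0.5
print(project_height(2.0, 8.0, 1.0))  # 0.25
```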

The depth of a point in the scene controls the size of its projected image: \(y\) must be scaled in inverse proportion to \(z\). This requires a division by \(z\).

It is not obvious how to embed such a division in a 3D matrix, because a matrix multiply can only scale and add coordinates. We can use the homogeneous coordinate to pass along some depth information. If we require the homogeneous coordinate to always be 1, then normalizing a vector whose homogeneous coordinate is \(h\) forces a division by \(h\). So, for a homogeneous coordinate \(h \neq 0\):

$$ \begin{bmatrix} hx \\ hy \\ hz \\ h \end{bmatrix} = \begin{bmatrix} \frac{hx}{h} \\ \frac{hy}{h} \\ \frac{hz}{h} \\ \frac{h}{h} \end{bmatrix} = \begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix}$$
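
The normalization step can be sketched in a few lines of Python. The helper name `homogenize` and the use of plain lists for vectors are assumptions made for illustration.

```python
def homogenize(p):
    """Divide a homogeneous 4-vector by its last component so that w becomes 1."""
    w = p[3]
    if w == 0:
        raise ValueError("w is 0, so the vector cannot be normalized")
    return [c / w for c in p]


# (2, 4, 6, 2) and (1, 2, 3, 1) represent the same 3D point.
print(homogenize([2.0, 4.0, 6.0, 2.0]))  # [1.0, 2.0, 3.0, 1.0]
```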

In the 1D example, we multiply by \( \frac{z_i}{z} \), where \(z_i\) is the distance to the image plane. For a 3D frustum, we will act as if the image plane is at the near plane, so \(z_i = n\). So, we construct the matrix multiply so that the output \(w\) is proportional to the distance from the center of projection, \(w = \frac{z}{n}\), then homogenize the output vector so that \(w\) is again 1.

$$ \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & \frac{1}{n} & 0 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix} = \begin{bmatrix} x \\ y \\ z \\ \frac{z}{n} \end{bmatrix} = \begin{bmatrix} x\frac{n}{z} \\ y\frac{n}{z} \\ n \\ 1 \end{bmatrix}$$
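
To see the division happen, this simple projection can be applied to a sample point. The sketch below uses plain Python lists; the helper names (`mat_vec`, `homogenize`, `simple_perspective`) and the sample values are assumptions chosen for illustration.

```python
def mat_vec(m, v):
    """Multiply a 4x4 matrix (list of rows) by a 4-vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def homogenize(p):
    """Divide by w so that the homogeneous coordinate becomes 1."""
    w = p[3]
    return [c / w for c in p]


def simple_perspective(n):
    """Identity matrix, except the last row produces w = z / n."""
    return [
        [1.0, 0.0, 0.0, 0.0],
        [0.0, 1.0, 0.0, 0.0],
        [0.0, 0.0, 1.0, 0.0],
        [0.0, 0.0, 1.0 / n, 0.0],
    ]


n = 2.0
before_divide = mat_vec(simple_perspective(n), [3.0, 6.0, 6.0, 1.0])
print(before_divide)              # [3.0, 6.0, 6.0, 3.0]
print(homogenize(before_divide))  # [1.0, 2.0, 2.0, 1.0]
```

The point at depth 6 has its \(x\) and \(y\) scaled by \(\frac{n}{z} = \frac{1}{3}\), and its \(z\) lands on \(n = 2\).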

With the current projection, every point in the view volume ends up with \(z = n\), so all depth information is lost. For later transforms it is convenient to keep depth, and in particular to leave the \(z\) values of points on the near and far planes unchanged. Adjusting the third row of the matrix allows this:

$$ \mathbf{M}_{p}=\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & \frac{n+f}{n} & -f \\ 0 & 0 & \frac{1}{n} & 0 \end{bmatrix}$$

So, the general transform of a point is:

$$ \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & \frac{n+f}{n} & -f \\ 0 & 0 & \frac{1}{n} & 0 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix} = \begin{bmatrix} x \\ y \\ z(\frac{n+f}{n})-f \\ \frac{z}{n} \end{bmatrix} = \begin{bmatrix} x\frac{n}{z} \\ y\frac{n}{z} \\ n+f-\frac{fn}{z} \\ 1 \end{bmatrix}$$

For points on the near plane, \(z = n\):

$$ \begin{bmatrix} x\frac{n}{n} \\ y\frac{n}{n} \\ n+f-\frac{fn}{n} \\ 1 \end{bmatrix} = \begin{bmatrix} x \\ y \\ n \\ 1 \end{bmatrix}$$

For points on the far plane, \(z = f\):

$$ \begin{bmatrix} x\frac{n}{f} \\ y\frac{n}{f} \\ n+f-\frac{fn}{f} \\ 1 \end{bmatrix} = \begin{bmatrix} x\frac{n}{f} \\ y\frac{n}{f} \\ f \\ 1 \end{bmatrix}$$
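
The near-plane and far-plane cases can also be checked numerically. The following is a minimal sketch with hypothetical helper names and sample values \(n = 1\), \(f = 4\); it is not tied to any particular graphics library.

```python
def mat_vec(m, v):
    """Multiply a 4x4 matrix (list of rows) by a 4-vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def homogenize(p):
    """Divide by w so that the homogeneous coordinate becomes 1."""
    w = p[3]
    return [c / w for c in p]


def perspective(n, f):
    """M_p with the third row adjusted to preserve z on the near and far planes."""
    return [
        [1.0, 0.0, 0.0, 0.0],
        [0.0, 1.0, 0.0, 0.0],
        [0.0, 0.0, (n + f) / n, -f],
        [0.0, 0.0, 1.0 / n, 0.0],
    ]


def project(m, x, y, z):
    """Apply the matrix to the point (x, y, z, 1) and homogenize."""
    return homogenize(mat_vec(m, [x, y, z, 1.0]))


n, f = 1.0, 4.0
m_p = perspective(n, f)
print(project(m_p, 1.0, 1.0, n))    # [1.0, 1.0, 1.0, 1.0]  near plane: unchanged
print(project(m_p, 2.0, 2.0, f))    # [0.5, 0.5, 4.0, 1.0]  far plane: z stays f
print(project(m_p, 2.0, 2.0, 2.0))  # [1.0, 1.0, 3.0, 1.0]  in between: z is remapped
```

The last line shows that depth between the two planes is not preserved: following \(n+f-\frac{fn}{z}\), a point at \(z = 2\) moves to \(z = 3\).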

This is the projection matrix we will use. We can make it prettier by scaling it by \(n\). This has no effect on the resulting point: the factor \(n\) multiplies all four output components, including \(w\), so it cancels when we divide by \(w\).

$$ \mathbf{M}_{p} = n \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & \frac{n+f}{n} & -f \\ 0 & 0 & \frac{1}{n} & 0 \end{bmatrix} = \begin{bmatrix} n & 0 & 0 & 0 \\ 0 & n & 0 & 0 \\ 0 & 0 & n+f & -fn \\ 0 & 0 & 1 & 0 \end{bmatrix}$$
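
The claim that the scale factor cancels can be confirmed with a small sketch. The helper names and the sample values (\(n = 2\), \(f = 8\), and the test point) are assumptions for illustration.

```python
def mat_vec(m, v):
    """Multiply a 4x4 matrix (list of rows) by a 4-vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def homogenize(p):
    """Divide by w so that the homogeneous coordinate becomes 1."""
    w = p[3]
    return [c / w for c in p]


n, f = 2.0, 8.0
m_p = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, (n + f) / n, -f],
    [0.0, 0.0, 1.0 / n, 0.0],
]
# Multiply every entry by n to get the "prettier" form.
m_p_scaled = [[n * entry for entry in row] for row in m_p]

point = [3.0, 1.0, 4.0, 1.0]
unscaled_result = homogenize(mat_vec(m_p, point))
scaled_result = homogenize(mat_vec(m_p_scaled, point))
print(unscaled_result)  # [1.5, 0.5, 6.0, 1.0]
print(scaled_result)    # [1.5, 0.5, 6.0, 1.0]
```

Before the divide the two vectors differ by the factor \(n\); after the divide they are identical.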

The current projection matrix converts the frustum-shaped view volume into a view volume shaped like a rectangular prism. So, we still need to apply an orthographic projection to center the resulting prism on the origin and scale it correctly. The complete projection matrix is the product of the two:

$$ \mathbf{M}_{projection}= \begin{bmatrix} \frac{2}{r-l} & 0 & 0 & -\frac{r+l}{r-l} \\ 0 & \frac{2}{t-b} & 0 & -\frac{t+b}{t-b} \\ 0 & 0 & \frac{2}{n-f} & -\frac{n+f}{n-f} \\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} n & 0 & 0 & 0 \\ 0 & n & 0 & 0 \\ 0 & 0 & n+f & -fn \\ 0 & 0 & 1 & 0 \end{bmatrix} = \begin{bmatrix} \frac{2n}{r-l} & 0 & \frac{l+r}{l-r} & 0\\ 0 & \frac{2n}{t-b} & \frac{b+t}{b-t} & 0\\ 0 & 0 & \frac{f+n}{n-f} & \frac{2fn}{f-n} \\ 0 & 0 & 1 & 0 \end{bmatrix}$$
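
As a sanity check, the sketch below builds both factors, multiplies them, and compares the product against the closed form. It then pushes two frustum corners through the result. All function names and the sample bounds are assumptions made for illustration; the matrices themselves are the ones written above.

```python
import math


def mat_mul(a, b):
    """Product of two 4x4 matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def mat_vec(m, v):
    """Multiply a 4x4 matrix by a 4-vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def homogenize(p):
    """Divide by w so that the homogeneous coordinate becomes 1."""
    w = p[3]
    return [c / w for c in p]


def all_close(a, b, tol=1e-12):
    """Entry-wise comparison of two matrices within a small tolerance."""
    return all(math.isclose(x, y, abs_tol=tol)
               for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))


def orthographic(l, r, b, t, n, f):
    return [
        [2.0 / (r - l), 0.0, 0.0, -(r + l) / (r - l)],
        [0.0, 2.0 / (t - b), 0.0, -(t + b) / (t - b)],
        [0.0, 0.0, 2.0 / (n - f), -(n + f) / (n - f)],
        [0.0, 0.0, 0.0, 1.0],
    ]


def frustum_to_prism(n, f):
    """The scaled perspective matrix M_p."""
    return [
        [n, 0.0, 0.0, 0.0],
        [0.0, n, 0.0, 0.0],
        [0.0, 0.0, n + f, -f * n],
        [0.0, 0.0, 1.0, 0.0],
    ]


def projection(l, r, b, t, n, f):
    """Closed form of orthographic(...) times frustum_to_prism(...)."""
    return [
        [2.0 * n / (r - l), 0.0, (l + r) / (l - r), 0.0],
        [0.0, 2.0 * n / (t - b), (b + t) / (b - t), 0.0],
        [0.0, 0.0, (f + n) / (n - f), 2.0 * f * n / (f - n)],
        [0.0, 0.0, 1.0, 0.0],
    ]


l, r, b, t, n, f = -1.0, 3.0, -1.0, 3.0, 1.0, 3.0
m = projection(l, r, b, t, n, f)
product = mat_mul(orthographic(l, r, b, t, n, f), frustum_to_prism(n, f))
print(all_close(m, product))  # True

# Near top-right corner, and far bottom-left corner (scaled out by f / n).
near_corner = homogenize(mat_vec(m, [r, t, n, 1.0]))
far_corner = homogenize(mat_vec(m, [l * f / n, b * f / n, f, 1.0]))
print(near_corner)  # [1.0, 1.0, 1.0, 1.0]
print(far_corner)   # [-1.0, -1.0, -1.0, 1.0]
```

With the matrices as written, the frustum corners land on the corners of the cube \([-1, 1]^3\), with the near plane at \(z = 1\) and the far plane at \(z = -1\).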